A Multiform Balanced Dependency Treebank for Romanian

Author

  • Mihaela Colhon
Abstract

The UAIC-RoDia-DepTb is a balanced treebank containing texts in non-standard language: 2,575 chat sentences, old Romanian texts (a Gospel printed in 1648, a codex of laws printed in 1818, a novel written in 1910), regional popular poetry, legal texts, Romanian and foreign fiction, and quotations. The proportions are comparable: each of these text types is represented by a subset of at least 1,000 phrases, so that the parser can be trained on its peculiarities. The annotation of the treebank started in 2007. It uses classical tags, such as those of school grammar, with the intention of using the resource for didactic purposes. The classification of circumstantial modifiers is rich in semantic information. In this paper we present the ongoing development of this resource, which has been automatically annotated and entirely manually corrected. We are adding new texts and making the treebank available in more formats, keeping all the annotated morphological and syntactic information and adding logical-semantic information. We describe two conversions, from the classic syntactic format into the Universal Dependencies format and into a logical-semantic layer, both of which are presented briefly.

Similar resources

A Romanian Dependency Treebank

The Romanian Treebank was created with both manual annotation and automatic annotation that was manually checked. The syntactic relationships were meticulously defined. We aim to affiliate our treebank with Universal Dependencies; in this way, some categories would become subclassifications. For the creation of this treebank, we have built an annotation interface and a Romanian language dependent parser that works with st...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Universal and Language-specific Dependency Relations for Analysing Romanian

This paper is meant as a brief description of the Romanian syntax within the dependency framework, more specifically within the Universal Dependency (UD) framework, and is the result of a volunteer activity of mapping two independently created Romanian dependency treebanks to the UD specifications. This mapping process is not trivial, as concessions have to be made and solutions need to be foun...

Automatic Conversion of a Persian Dependency Treebank to a Constituency Treebank

There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, no large constituency treebank is available for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...

Generating a Persian Constituency Treebank by Automatic Conversion

Treebanks are an important and useful resource in Natural Language Processing tasks. Dependency structure and phrase structure are the two best-known kinds of treebanks. Many efforts have already been made to convert dependency structure to phrase structure. In this paper we study an approach to converting dependency structure to phrase structure, motivated by the lack of a large phrase structure treebank for Persian. A...

Publication date: 2017